IMPLEMENTATION ARTICLE 


Related to other papers in this | 5 (p47); 20 (p199) 
special issue 


Addressing FAIR principles A1.2, R1.1 (R1.2) 


Ontology-based Access Control for FAIR Data 


Christopher Brewster'”*, Barry Nouwt', Stephan Raaijmakers' & Jack Verhoosel' 


‘Data Science Department at TNO, Kampweg 55, Soesterberg 3769 DE, The Netherlands 


*Institute of Data Science, Maastricht University, Maastricht 6229 ER, The Netherlands 


Keywords: Access control; Semantic technology; Ontology; Security; FAIR 


Citation: C. Brewster, B. Nouwt, S. Raaijmakers & J. Verhoosel. Ontology-based access control for FAIR data. Data Intelligence 
2(2020), 66-77. doi: 10.1162/dint_a_00029 


ABSTRACT 


This paper focuses on fine-grained, secure access to FAIR data, for which we propose ontology-based data 
access policies. These policies take into account both the FAIR aspects of the data relevant to access (such 
as provenance and licence), expressed as metadata, and additional metadata describing users. With this 
tripartite approach (data, associated metadata expressing FAIR information, and additional metadata about 
users), secure and controlled access to object data can be obtained. This yields a security dimension to the 
“A” (accessible) in FAIR, which is clearly needed in domains like security and intelligence. These domains 
need data to be shared under tight controls, with widely varying individual access rights. In this paper, we 
propose an approach called Ontology-Based Access Control (OBAC), which utilizes concepts and relations 
from a data set's domain ontology. We argue that ontology-based access policies contribute to data reusability 
and can be reconciled with privacy-aware data access policies. We illustrate our OBAC approach through 
a proof-of-concept and propose that OBAC to be adopted as a best practice for access management of 
FAIR data. 
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1. INTRODUCTION 


This paper focuses on the “accessible” aspect of the FAIR principles, arguing for the use of ontologies 
for fine-grained and secure access to FAIR data. In the twenty-five plus years since the creation of the World 
Wide Web, we have moved from a Web of documents to a vision for a Web of data [1, 2], especially open 
data and linked data [3]. The FAIR data principles represent the latest incarnation of a growing understanding 
of the implications of sharing data with other parties. While the “open data” movement represented a 
slightly utopian vision of the many possibilities of data sharing, the FAIR principles respect the reality of 
practical data sharing across different parties. For instance, the Accessible (“A”) dimension of FAIR data can 
be seen as a parameter that varies between very open to very closed. Mons et al. [4] underline that “The 
“A' in FAIR stands for ‘Accessible under well-defined conditions’” and mention personal privacy, national 
security and competitiveness as reasons for data to be shielded. 


The original home territory of the FAIR principles was life science research, where there is both a strong 
tradition of open databases and shared metadata principles [5], as well as a strong awareness of the 
sensitivity of certain types of data (like health or commercial data). Similarly, the law enforcement and 
security domains have long struggled with the need for “total information awareness” [6] (i.e., a 
philosophically similar concept as “open data”) versus a “need to know” perspective on data access and 
use. For some actors such as police forces access to certain types of data is strictly controlled depending 
on authorization, warrants and court permissions and if rules are not strictly followed court cases can 
founder on the technicality of the evidence acquisition processes. 


The FAIR principles have, to the best of our knowledge, not been applied routinely to data originating 
from sensitive sources, or data subject to confidentiality, privacy concerns, and security-related sharing 
restrictions. FAIR principles usually are applied to publicly available, open data, or carefully defined 
academic research data sets. And yet, law enforcement agencies (LEAs) typically gather data from a multitude 
of sources (including public domain data), and after further annotation, linking and enrichment, new 
confidentiality issues may arise. The resulting (meta)data would still demand data sharing and provenance 
tracking, but in a much more restricted fashion (see e.g. [7] for discussion of the data-centric world of law 
enforcement agencies and resulting issues with data sharing). 


To this end, in this paper, we address FAIR principle A1.2: The [accessibility] protocol allows for an 
authentication and authorization procedure, where necessary. We propose to use ontology-based protocols 
for accessing object data through metadata. This allows for fine-tuned access, which additionally contributes 
to FAIR principle R1.1: (Meta)data are released with a clear and accessible data usage license. 


Our primary goal is to utilize the concepts and relations from a domain ontology for data access control, 
in settings where data sharing is demanded but restricted. We propose Ontology-Based Access Control 
(OBAC) policies for accessing data on the basis of assigned metadata. OBAC is a new method that utilizes 
Semantic Web technologies and allows security policies to describe the information someone has access 
to irrespective of where this information resides. While not essential, an ontology based approach allows 
for easy integration with such FAIR principles as having unique identifiers (such as URIs) (F1), using open 
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and free protocols (A1) and guaranteeing a high level of conformity with the Interoperable principles (11-3). 
Prerequisites for OBAC are: 


Metadata needs to be assigned to raw object data prior to access. 

2). The metadata scheme adheres to an ontology: it is hierarchically structured, with meaningful 
(interpretable, semantic) relations between nodes (concepts) and reflects domain knowledge. 

3). Access to the object data occurs through the metadata, with the possibility of defining access for a 
given person or role to specific layers (strata) in the metadata. 

4). Access to object data and re-usability of (meta)data is determined by referring to the structure of the 
metadata graph, the contents of the nodes, or both. 


OBAC policies abstract from implementation details (e.g., resources like files or APIs) and thus are easier 
to reuse for new applications, or for integrating multiple information systems. The policies determine the 
information to which a user has (or has not) access to, based on the role of the user. 


With OBAC, multiple resources can be captured in a single access policy. This makes the specification 
and maintenance of security policies more systematic and easier to manage. A single policy could, for 
example, give a user access to all pieces of information of a certain type, instead of writing out a policy 
for every piece of information separately. OBAC further allows for policies that enable access to data that 
comply with certain patterns in the implemented ontology. With the underlying data represented in an RDF 
triple store, these patterns express access to a single specific triple®. Thus, access can be defined at a very 
fine-grained level, which contrasts with traditional, high level access policies that use “all-or-nothing” 
approaches, like user name/password mechanisms. In the next section, we apply the OBAC approach to a 
use case from the security domain. 


2. USE CASE 


In many law enforcement scenarios, regulated data access is of prime importance. Similar to a house 
search warrant, police officers only can gather and inspect certain data on the basis of justified suspicion, 
and a mandate for pursuing their search. As an example, blindly going through the dark Web marketplace 
activity of a particular vendor of interest without an “information search warrant” will not yield evidence 
that would stand up in court. Here, OBAC is helpful. OBAC depends on a graph-based representation of 
metadata following Linked Data or knowledge graph principles. Usually, these metadata graphs are 
expressed as taxonomies (expressing strictly vertical, semantic relations between concepts) or ontologies 
that allow for modelling both vertical and horizontal semantic relations between nodes. Exploiting semantic 
relations between nodes for data access allows for “meaningful” access protocols. 


® From a practical point of view, SPARQL-like graph patterns can easily be used to access the object data stored in the RDF 
triple stores. 


68 Data Intelligence 


202211.00190v1 


chinaXiv 


ChinaXiva ERAT 
Ontology-based Access Control for FAIR Data 


An example of a metadata graph is depicted in Figure 1, which is a fragment of the categorization of 
illicit goods in the Agora dark Web market, generated automatically from a public Agora data dump [8]®. 


Figure 1. Categorization of illicit goods in the Agora marketplace. 


Once raw data become linked to metadata graphs, structure-sensitive access protocols can effectively 
regulate views on the underlying raw data through traversal of the metadata graph. Possible restrictions 
involve semantic (ontology) labels on visiting nodes (“only party drugs are visible”), or topological aspects 
of the graph (such as the radius around nodes in the metadata graph that can be inspected by a given user). 
Even combinations of semantic and structure-based access restrictions are possible (“only party drugs 
statistics are visible, but not individual posts with vendor nicknames”). LEAs often start investigations with 
hunches. Information access protocols, backed by increasingly liberal investigation mandates, allow 
investigators to follow up on those hunches, by digging deeper in the data graph. We propose that OBAC 
is just the right approach to implementing such role- and context-dependent policies. 


Figure 2 depicts access to a metadata graph, where every access level “sees” the information of the level 
above it, and at its own stratum. Starting from the top level (“Drugs”, “Weapons”), with a metadata structure 
Mou 


like “Drugs/ecstasy”, “Drugs/cocaine”, one can descend down the hierarchy, and finally (at the lowest level) 
obtain access to raw object data. 


® Agora was a (now defunct) darknet marketplace for illicit goods cf. https://en.wikipedia.org/wiki/Agora_(online_marketplace). 
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Figure 2. Access to object data through structured metadata. 


3. PROOF OF CONCEPT 


We implemented our OBAC approach by combining the open source SPARQL server Apache Jena Fuseki 
and the AuthzForce implementation of XACML. Apache Jena Fuseki (part of the Apache Jena framework) 
offers a SPARQL endpoint for accessing an RDF graph database [9]. XACML is a standard under development 
since 2001 by OASIS [10] for attribute-based access control usually expressed in XML (although now also 
in JSON) providing a means among other things for the description of security policies. AuthzForce 
implements the XACML standard [11] and consists of an authorization policy engine and a RESTful 
authorization server. Technical details can be found in the code on Github®. We utilize Apache Jena’s 
extensible permissions layer® to intercept user SPARQL queries and send an appropriate decision request 
to AuthzForce’s Policy Decision Point (PDP). The PDP evaluates the decision request against XACML 
authorization policies. Furthermore, we have extended AuthzForce with our custom GraphPatternValue and 


© https://github.com/stephanraaijmakers/obac-fair-data. 
® https://jena.apache.org/documentation/permissions/index.html. 
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Graph Pattern ValueMatchFunction to allow XACML security policies to include SPARQL-like graph patterns 
and let AuthzForce match those to parts of the user SPARQL query and decide whether access is approved 
or rejected. 


Our proof of concept is based on the following, typical LEA scenario: 


Two narcotics police officers, one from the United States and one from Australia, receive a limited mandate 
to investigate a dark Web marketplace, but only for narcotics that both originate from and are shipped to 
their own country. The limited mandate only allows them to look at the title, origin and destination of items. 
The narcotics officers use their mandate to browse the data and collect facts that might be worth further 
investigation. If approved, they will be allowed to work with a broader mandate. The broader mandate 


allows them to hone in on a more specific class of drugs (Steroids) in the data set. 


We applied our approach to the aforementioned Agora dark Web marketplace data set, a CSV file with 
over 100,000 unique listings (rows) and 12 columns describing listings (Vendor, Category, Item, Item 
Description, Price, Origin, Destination, Rating and Remarks). We took the first 5,000 records from this data 
set and linked them to the Agora ontology shown above in Figure 1. Columns of the data set are mapped 
to the ontology in the following way; every row (i.e., marketplace item) becomes an individual of the Item 
class with the following properties. 


Object property assertions: 


e hasDestination: the destination country of the item. 
e hasOrigin: the origin country of the item. 

e hasVendor: the vendor of the item. 

e hasTopic: the topic (category) of the item. 


Data property assertions: 


e hasRating: the rating of the item. 

e hasPrice: the price of the item. 

e hasDescription: the description of the item. 
e hasRemark: the remark of the item. 

e hasTitle: the title of the item. 


This mapping allows us to convert the CSV Agora data set into RDF data using the OpenRefine® tool. 
Although OBAC does not restrict the location and format of the data under its control (i.e., its security 
policies only describe what information someone has access to in terms of the domain ontology), we 
transformed the Agora data set into a single RDF graph to simplify the experiment. We load this RDF data 
into an Apache Jena triple store and used the following security policies. 


®  http://openrefine.org/. 
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The security policies for this scenario involve the following four roles: 


- USA_drugs_officer_limited: the role for a drugs officer from the United States with a limited mandate. 
- AUS_drugs_officer_limited: the role for a drugs officer from Australia with a limited mandate. 

- USA_drugs_officer_broad: the role for a drugs officer from the United States with a broad mandate. 
- AUS_drugs_officer_broad: the role for a drugs officer from Australia with a broad mandate. 


One of the graph patterns for the role USA_drugs_officer_limited looks like this®: 


2item dwochasTitle ?object . 

IF 

2item rdf:type dwo:ltem . 

2item dwo:hasDestination dwd:USA . 
łitem dwo:hasOrigin dwd:USA . 

2item dwo:hasTopic ?topic . 

2topic rdf:type ?topicType . 

2topicType rdfs:subClassOf+ dwo:Drugs . 


This graph pattern should be read as: 


This role gives access to the title of an item, if the item has USA as its destination and origin country, 
and is drugs related. 


The above policy will be repeated for the other properties apart from “title” to which this role should 
have access. Once these security policies are in place, the USA drugs officer can inspect the data set safely 
without risking trespassing the mandate. The officer might be interested in all items he has access to and 
composes the following SPARQL query: 


SELECT ?item ?¢title ¢topic 
WHERE { 

item a dwo:ltem . 

2item dwo:hastTitle ?title . 

łitem dwo:hasTopic ?topic . 


} 


® For readability we included prefixes (where the prefix dwo stands for dark web ontology and dwd for dark web data) and 
the IF keyword; these are not part of the actual implementation. 
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The above query gives the following example results (Table 1). 


Table 1. Example results of SPARQL queries. 


Item Title Topic 
dwd:item/124 4-FMA...!! 1 KG dwo:Drugs/RCs 
dwd:item/2423 Ephedrine HCL 8mg (50x Pills 400mg Total) dwo:Drugs/Weight_loss 


dwd:item/2561 ORBIS PHARMA LABS - TEST C 10ML VIAL X 1 - Testosterone dwo:Drugs/Steroids 
Cypionate - $67 


Note: These results only include items that have USA as their origin and destination, even though this is not explicitly mentioned 
in the SPARQL query. 


From this data browsing session, the officer finds hints that especially the Steroids related drugs are 
increasing in the USA region. To further investigate this, the officer gets another role with a broader mandate 
that is needed to access vendor information. The graph pattern for the role USA_drugs_officer_broad that 
reflects this broader mandate is as follows: 


2item ?predicate ?object . 

IF 

litem rdf:type dwo:ltem . 

2item dwo:hasDestination dwd:USA . 
łitem dwo:hasOrigin dwd:USA . 

2item dwo:hasTopic ?topic . 

2topic rdf:type ¢topicType . 

2topicType rdfs:subClassOf+ dwo:Steroids . 


This graph pattern can be read as: 


This role gives access to all properties of an item, if the item has USA as its destination and origin country, 
and its topic is steroids related. 


With the broader mandate the officer can follow up, and submit the following query: 


SELECT ?item ?title ?vendor 
WHERE { 

2item a dwo:ltem . 

łitem dwo:hasTitle ?@title . 

litem dwo:hasVendor ?vendor . 


} 
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This query produces the following result (Table 2). 


Table 2. Example results of SPARQL queries. 


Item Title Vendor 
dwd:item/2561 ORBIS PHARMA LABS - TEST C 10ML VIAL X 1 - Testosterone —dwd:InsideTheWhale 
Cypionate - $67 
dwd:item/2563 Nordicor Trenbolone Enanthate 200mg/ml 10 vial dwd:pharmacy_land 
dwd:item/2578 Rip 300 10ml multi injection vial dwd:cerberus 


Table 2 lists user specific (vendor) names. This demonstrates that the use of OBAC security policies as 
metadata to the Agora data set, makes the Agora data set more FAIR by providing an explicit and machine 
readable description of the allowed access and show that our proof-of-concept implementation can enforce 
them. 


4. RELATED WORK 


Attribute based access control (ABAC) has become the standard approach to providing differential access 
to data in databases [12, 13]. There exists a body of work using ontologies and semantic principles to 
simplify and systematize the ABAC approach. Reference [14] focuses on using ontologies to simplify 
XACML, while [15] and [16] both use OWL to provide access control mechanisms. OBAC differs from 
these approaches in that it does not use an ontology to describe the users, roles and permissions, but it 
uses the domain ontology to specify the exact resources (or information) a particular user has access to or 
not. Reference [17] is an example where the ontology is used for both these purposes and this is also part 
of the future work for OBAC. Furthermore, the other approaches focus on access control to Web services, 
while our approach is independent of the location of the data. Reference [18] proposes OBDA, Ontology- 
Based Data Access, which emphasizes data consistency through metadata schemes and data content. Our 
OBAC approach is complementary to OBDA. 


Previous work on the application of the FAIR principles in the context of restricted access to data has 
been limited. Seemingly operating under a misunderstanding of the FAIR principles, the research in [19] 
proposes an “extension” to the existing FAIR principles to specifically accommodate the needs of sensitive 
clinical data where privacy needs to be maximized. The authors’ approach is particularly influenced by the 
requirements of the GDPR. In a similar spirit, the authors in [20] propose defining specific FAIR data types 
to capture privacy aspects in the metadata together with a coherent data access governance framework. 
More generally, the OMETA system of [21] exploits metadata for the configuration, capture, inspection and 
distribution of biomedical data. 
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5. CONCLUSION AND FUTURE WORK 


We have outlined a technical approach to applying ontology-based authentication protocols to structured 
data. Exploiting the structure of the applied ontology, our approach allows for data access control, by 
assigning semantic or topological restrictions to users, based on individual or group-based policies. Our 
work targets the FAIR principles A1.2 and R1.1, and thus affects both data access authentication and data 
usage. In interactive scenarios, where data users can modify or assign metadata themselves, our methods 
also can be extended to fine-grained metadata provenance control (FAIR principle R1.2). The approach we 
propose is entirely complementary with Landi et al. [22] and provides a way to operationalize the requisite 
access controls. The policy-based approach we espouse can be adapted to fit with legal requirements such 
as GDPR or even the protection of copyright in some circumstances [23]. 


The OBAC approach is currently being evaluated in on-going research to enable European LEA’s to 
inspect dark Web data. We intend to apply OBAC to other domains as well, including health. Future work 
will address systematic descriptions and implementations of different ontology-based access policies, 
including ones that restrict information access based on navigational limits, or more semantics-oriented 
policies described by an ontology. Furthermore, we need to model users using ontologies thereby combining 
both role-based and attribute-based access control methods. Adding methods for user interaction with 
implemented policies, like negotiation-based policy bypassing (e.g., trading one restriction in for another) 
or quota-based data access will make OBAC more user-oriented. 
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